71 research outputs found

    Optimal Pose and Shape Estimation for Category-level 3D Object Perception

    We consider a category-level perception problem, where one is given 3D sensor data picturing an object of a given category (e.g. a car), and has to reconstruct the pose and shape of the object despite intra-class variability (i.e. different car models have different shapes). We consider an active shape model, where -- for an object category -- we are given a library of potential CAD models describing objects in that category, and we adopt a standard formulation where pose and shape estimation are formulated as a non-convex optimization. Our first contribution is to provide the first certifiably optimal solver for pose and shape estimation. In particular, we show that rotation estimation can be decoupled from the estimation of the object translation and shape, and we demonstrate that (i) the optimal object rotation can be computed via a tight (small-size) semidefinite relaxation, and (ii) the translation and shape parameters can be computed in closed-form given the rotation. Our second contribution is to add an outlier rejection layer to our solver, hence making it robust to a large number of misdetections. Towards this goal, we wrap our optimal solver in a robust estimation scheme based on graduated non-convexity. To further enhance robustness to outliers, we also develop the first graph-theoretic formulation to prune outliers in category-level perception, which removes outliers via convex hull and maximum clique computations; the resulting approach is robust to 70%-90% outliers. Our third contribution is an extensive experimental evaluation. Besides providing an ablation study on a simulated dataset and on the PASCAL3D+ dataset, we combine our solver with a deep-learned keypoint detector, and show that the resulting approach improves over the state of the art in vehicle pose estimation in the ApolloScape datasets.
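The closed-form step mentioned in the abstract can be illustrated in a much simplified setting. The sketch below assumes a single known model (no shape coefficients, no outliers) and a rotation that is already given; under those assumptions the least-squares translation is the measured centroid minus the rotated model centroid. The function names are hypothetical and this is not the paper's solver.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def centroid(points):
    """Component-wise mean of a list of 3D points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]


def translation_given_rotation(R, model_pts, measured_pts):
    """Translation t minimizing sum ||y_i - (R x_i + t)||^2 for a fixed R.

    Setting the gradient with respect to t to zero gives
    t = mean(y) - R mean(x).
    """
    rotated_model_centroid = mat_vec(R, centroid(model_pts))
    measured_centroid = centroid(measured_pts)
    return [measured_centroid[k] - rotated_model_centroid[k] for k in range(3)]


# Illustrative data (assumed): a 90-degree rotation about the z axis
# and a known translation, applied to four model keypoints.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t_true = [1.0, 2.0, 3.0]
model = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
measured = [[a + b for a, b in zip(mat_vec(R, x), t_true)] for x in model]

t_est = translation_given_rotation(R, model, measured)
```

With noise-free measurements, `t_est` matches `t_true` up to floating-point error.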

    Loc-NeRF: Monte Carlo Localization using Neural Radiance Fields

    We present Loc-NeRF, a real-time vision-based robot localization approach that combines Monte Carlo localization and Neural Radiance Fields (NeRF). Our system uses a pre-trained NeRF model as the map of an environment and can localize itself in real-time using an RGB camera as the only exteroceptive sensor onboard the robot. While neural radiance fields have seen significant applications for visual rendering in computer vision and graphics, they have found limited use in robotics. Existing approaches for NeRF-based localization require both a good initial pose guess and significant computation, making them impractical for real-time robotics applications. By using Monte Carlo localization as a workhorse to estimate poses using a NeRF map model, Loc-NeRF is able to perform localization faster than the state of the art and without relying on an initial pose estimate. In addition to testing on synthetic data, we also run our system using real data collected by a Clearpath Jackal UGV and demonstrate for the first time the ability to perform real-time global localization with neural radiance fields. We make our code publicly available at https://github.com/MIT-SPARK/Loc-NeRF
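For readers unfamiliar with Monte Carlo localization, the predict, weight, and resample loop it relies on can be sketched in one dimension. This is a generic particle filter, not the Loc-NeRF code: the Gaussian measurement model stands in for comparing camera pixels against NeRF renderings, and all names and noise values are assumptions chosen for illustration.

```python
import math
import random


def mcl_step(particles, control, measurement, measure_fn, rng,
             motion_noise=0.1, meas_noise=0.5):
    """One Monte Carlo localization update on a list of 1D pose hypotheses."""
    # Predict: apply the motion command with sampled noise.
    moved = [p + control + rng.gauss(0.0, motion_noise) for p in particles]
    # Weight: Gaussian likelihood of the measurement under each hypothesis.
    weights = [
        math.exp(-0.5 * ((measurement - measure_fn(p)) / meas_noise) ** 2)
        for p in moved
    ]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(moved)
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def run_demo(seed=0, n_particles=500, steps=20):
    """Global localization: particles start uniformly over the whole map."""
    rng = random.Random(seed)
    true_x = 3.0
    particles = [rng.uniform(0.0, 10.0) for _ in range(n_particles)]
    for _ in range(steps):
        true_x += 0.1                          # robot moves
        z = true_x + rng.gauss(0.0, 0.5)       # noisy position measurement
        particles = mcl_step(particles, 0.1, z, lambda x: x, rng)
    estimate = sum(particles) / len(particles)
    return estimate, true_x
```

Because the particles are initialized uniformly over the map, no initial pose guess is needed; the estimate is simply the mean of the resampled particles.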

    Transport of topologically protected photonic waveguide on chip

    We propose a new design for on-chip integrated optical devices with an extra width degree of freedom, using a photonic crystal waveguide with Dirac points between two photonic crystals with opposite valley Chern numbers. With such an extra waveguide, we demonstrate numerically that the topologically protected photonic waveguide retains the properties of valley-locking and immunity to defects. Owing to the design flexibility of the width-tunable topologically protected photonic waveguide, many unique on-chip integrated devices have been proposed, such as energy concentrators with a concentration efficiency improved by more than one order of magnitude and topological photonic power splitters with arbitrary power splitting ratios. The topologically protected photonic waveguide with the width degree of freedom could be beneficial for scaling up photonic devices, and provides a new flexible platform for implementing integrated photonic networks on chip.
    Comment: 19 pages, 5 figures

    Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs

    Humans are able to form a complex mental model of the environment they move in. This mental model captures geometric and semantic aspects of the scene, describes the environment at multiple levels of abstraction (e.g., objects, rooms, buildings), and includes static and dynamic entities and their relations (e.g., a person is in a room at a given time). In contrast, current robots' internal representations still provide a partial and fragmented understanding of the environment, either in the form of a sparse or dense set of geometric primitives (e.g., points, lines, planes, voxels) or as a collection of objects. This paper attempts to reduce the gap between robot and human perception by introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that seamlessly captures metric and semantic aspects of a dynamic environment. A DSG is a layered graph where nodes represent spatial concepts at different levels of abstraction, and edges represent spatio-temporal relations among nodes. Our second contribution is Kimera, the first fully automatic method to build a DSG from visual-inertial data. Kimera includes state-of-the-art techniques for visual-inertial SLAM, metric-semantic 3D reconstruction, object localization, human pose and shape estimation, and scene parsing. Our third contribution is a comprehensive evaluation of Kimera in real-life datasets and photo-realistic simulations, including a newly released dataset, uHumans2, which simulates a collection of crowded indoor and outdoor scenes. Our evaluation shows that Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a complex indoor environment with tens of objects and humans in minutes. Our final contribution shows how to use a DSG for real-time hierarchical semantic path-planning. The core modules in Kimera are open-source.
    Comment: 34 pages, 25 figures, 9 tables. arXiv admin note: text overlap with arXiv:2002.0628
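The layered-graph idea described in the abstract (nodes assigned to abstraction layers, edges carrying relations and optional timestamps) can be sketched as a small data structure. The class, method, and layer names below are hypothetical and chosen for illustration; they are not taken from the Kimera code base.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredGraph:
    """Toy layered graph: each node belongs to one abstraction layer."""
    layers: dict = field(default_factory=dict)  # node id -> layer name
    edges: list = field(default_factory=list)   # (src, dst, relation, timestamp)

    def add_node(self, node_id, layer):
        self.layers[node_id] = layer

    def add_edge(self, src, dst, relation, timestamp=None):
        # Reject edges whose endpoints have not been registered as nodes.
        if src not in self.layers or dst not in self.layers:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, dst, relation, timestamp))

    def nodes_in_layer(self, layer):
        """Sorted ids of all nodes assigned to the given layer."""
        return sorted(n for n, name in self.layers.items() if name == layer)

    def related(self, dst, relation):
        """Sorted ids of nodes with a `relation` edge pointing at `dst`."""
        return sorted(s for s, d, r, _ in self.edges if d == dst and r == relation)


# Illustrative scene (assumed): one building, one room, an object and a person.
graph = LayeredGraph()
graph.add_node("building_1", "buildings")
graph.add_node("room_1", "rooms")
graph.add_node("chair_1", "objects")
graph.add_node("person_1", "agents")
graph.add_edge("room_1", "building_1", "inside")
graph.add_edge("chair_1", "room_1", "inside")
graph.add_edge("person_1", "room_1", "inside", timestamp=12.5)

occupants = graph.related("room_1", "inside")
```

Here the timestamp on the person's edge records that the relation holds at a given time, while static relations leave it unset.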

    Single charge control of localized excitons in heterostructures with ferroelectric thin films and two-dimensional transition metal dichalcogenides

    Single charge control of localized excitons (LXs) in two-dimensional transition metal dichalcogenides (TMDCs) is crucial for potential applications in quantum information processing and storage. However, the traditional electrostatic doping method of applying metallic gates to TMDCs may cause inhomogeneous charge distribution, optical quenching, and energy loss. Here, by locally controlling the ferroelectric polarization of the ferroelectric thin film BiFeO3 (BFO) with a scanning probe, we can deterministically manipulate the doping type of monolayer WSe2 to achieve p-type and n-type doping. This nonvolatile approach can maintain the doping type and hold the localized excitonic charges for a long time without applied voltage. Our work demonstrates that the ferroelectric polarization of BFO can control the charges of LXs effectively. Neutral and charged LXs have been observed in different ferroelectric polarization regions, as confirmed by magneto-optical measurements. A high degree of circular polarization, about 90%, of the photon emission from these quantum emitters has been achieved in high magnetic fields. Controlling the single charge of LXs in a nonvolatile way shows great potential for deterministic photon emission with desired charge states for photonic long-term memory.
    Comment: 13 pages, 5 figures